224 research outputs found

    Image-based Recommendations on Styles and Substitutes

    Full text link
    Humans inevitably develop a sense of the relationships between objects, some of which are based on their appearance. Some pairs of objects might be seen as being alternatives to each other (such as two pairs of jeans), while others may be seen as being complementary (such as a pair of jeans and a matching shirt). This information guides many of the choices that people make, from buying clothes to their interactions with each other. We seek here to model this human sense of the relationships between objects based on their appearance. Our approach is not based on fine-grained modeling of user annotations but rather on capturing the largest dataset possible and developing a scalable method for uncovering human notions of the visual relationships within. We cast this as a network inference problem defined on graphs of related images, and provide a large-scale dataset for the training and evaluation of the same. The system we develop is capable of recommending which clothes and accessories will go well together (and which will not), amongst a host of other applications.Comment: 11 pages, 10 figures, SIGIR 201

    Generalized Many-Way Few-Shot Video Classification

    Get PDF
    Few-shot learning methods operate in low data regimes. The aim is to learn with few training examples per class. Although significant progress has been made in few-shot image classification, few-shot video recognition is relatively unexplored and methods based on 2D CNNs are unable to learn temporal information. In this work we thus develop a simple 3D CNN baseline, surpassing existing methods by a large margin. To circumvent the need of labeled examples, we propose to leverage weakly-labeled videos from a large dataset using tag retrieval followed by selecting the best clips with visual similarities, yielding further improvement. Our results saturate current 5-way benchmarks for few-shot video classification and therefore we propose a new challenging benchmark involving more classes and a mixture of classes with varying supervision

    Unsupervised Learning of Discriminative Relative Visual Attributes

    Full text link

    Graph Matching via Sequential Monte Carlo

    Get PDF
    International audienceGraph matching is a powerful tool for computer vision and machine learning. In this paper, a novel approach to graph matching is developed based on the sequential Monte Carlo framework. By constructing a sequence of intermediate target distributions, the proposed algorithm sequentially performs a sampling and importance resampling to maximize the graph matching objective. Through the sequential sampling procedure, the algorithm effectively collects potential matches under one-to-one matching constraints to avoid the adverse effect of outliers and deformation. Experimental evaluations on synthetic graphs and real images demonstrate its higher robustness to deformation and outliers

    DaMN – Discriminative and Mutually Nearest: Exploiting Pairwise Category Proximity for Video Action Recognition

    Full text link
    We propose a method for learning discriminative category-level features and demonstrate state-of-the-art results on large-scale action recognition in video. The key observation is that one-vs-rest classifiers, which are ubiquitously employed for this task, face challenges in separating very similar categories (such as running vs. jogging). Our proposed method automatically identifies such pairs of categories using a criterion of mutual pairwise proximity in the (kernelized) feature space, using a category-level similarity matrix where each entry corresponds to the one-vs-one SVM margin for pairs of categories. We then exploit the observation that while splitting such Siamese Twin categories may be difficult, separating them from the remaining categories in a two-vs-rest framework is not. This enables us to augment one-vs-rest classifiers with a judicious selection of two-vs-rest classifier outputs, formed from such discriminative and mutually nearest (DaMN) pairs. By combining one-vs-rest and two-vs-rest features in a principled probabilistic manner, we achieve state-of-the-art results on the UCF101 and HMDB51 datasets. More importantly, the same DaMN features, when treated as a mid-level representation also outperform existing methods in knowledge transfer experiments, both cross-dataset from UCF101 to HMDB51 and to new categories with limited training data (one-shot and few-shot learning). Finally, we study the generality of the proposed approach by applying DaMN to other classification tasks; our experiments show that DaMN outperforms related approaches in direct comparisons, not only on video action recognition but also on their original image dataset tasks. © 2014 Springer International Publishing

    An Analysis of Errors in Graph-Based Keypoint Matching and Proposed Solutions

    Get PDF
    International audienceAn error occurs in graph-based keypoint matching when key-points in two different images are matched by an algorithm but do not correspond to the same physical point. Most previous methods acquire keypoints in a black-box manner, and focus on developing better algorithms to match the provided points. However to study the complete performance of a matching system one has to study errors through the whole matching pipeline, from keypoint detection, candidate selection to graph optimisation. We show that in the full pipeline there are six different types of errors that cause mismatches. We then present a matching framework designed to reduce these errors. We achieve this by adapting keypoint detectors to better suit the needs of graph-based matching, and achieve better graph constraints by exploiting more information from their keypoints. Our framework is applicable in general images and can handle clutter and motion discontinuities. We also propose a method to identify many mismatches a posteriori based on Left-Right Consistency inspired by stereo matching due to the asymmetric way we detect keypoints and define the graph

    Coherent states on spheres

    Get PDF
    We describe a family of coherent states and an associated resolution of the identity for a quantum particle whose classical configuration space is the d-dimensional sphere S^d. The coherent states are labeled by points in the associated phase space T*(S^d). These coherent states are NOT of Perelomov type but rather are constructed as the eigenvectors of suitably defined annihilation operators. We describe as well the Segal-Bargmann representation for the system, the associated unitary Segal-Bargmann transform, and a natural inversion formula. Although many of these results are in principle special cases of the results of B. Hall and M. Stenzel, we give here a substantially different description based on ideas of T. Thiemann and of K. Kowalski and J. Rembielinski. All of these results can be generalized to a system whose configuration space is an arbitrary compact symmetric space. We focus on the sphere case in order to be able to carry out the calculations in a self-contained and explicit way.Comment: Revised version. Submitted to J. Mathematical Physic

    Time Scale Approach for Chirp Detection

    Get PDF
    International audienceTwo different approaches for joint detection and estimation of signals embedded in stationary random noise are considered and compared, for the subclass of amplitude and frequency modulated signals. Matched filter approaches are compared to time-frequency and time scale based approaches. Particular attention is paid to the case of the so-called " power-law chirps " , characterized by monomial and polynomial amplitude and frequency functions. As target application, the problem of gravitational waves at interferometric detectors is considered

    Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets

    Get PDF
    Automatic discovery of category-specific 3D keypoints from a collection of objects of a category is a challenging problem. The difficulty is added when objects are represented by 3D point clouds, with variations in shape and semantic parts and unknown coordinate frames. We define keypoints to be category-specific, if they meaningfully represent objects’ shape and their correspondences can be simply established order-wise across all objects. This paper aims at learning such 3D keypoints, in an unsupervised manner, using a collection of misaligned 3D point clouds of objects from an unknown category. In order to do so, we model shapes defined by the keypoints, within a category, using the symmetric linear basis shapes without assuming the plane of symmetry to be known. The usage of symmetry prior leads us to learn stable keypoints suitable for higher misalignments. To the best of our knowledge, this is the first work on learning such keypoints directly from 3D point clouds for a general category. Using objects from four benchmark datasets, we demonstrate the quality of our learned keypoints by quantitative and qualitative evaluations. Our experiments also show that the keypoints discovered by our method are geometrically and semantically consistent
    • …
    corecore